Lecture 9: Linear Bandits (Part II)
Abstract
There exists an ellipsoidal confidence region for $w$, as described in the following theorem.

Theorem 1 ([2], Theorem 2). Assuming $\|w\| \le \sqrt{d}$ and $\|x_t\| \le \sqrt{d}$, with probability $1 - \delta$ we have $w \in C_t$, where
$$C_t = \left\{ z : \|z - \hat{w}_t\|_{M_t} \le 2\sqrt{d \log \tfrac{Td}{\delta}} \right\}.$$

For any $x \in A$, we define $\mathrm{UCB}_{x,t} = \max_{z \in C_t} z'x$, which upper-bounds $w'x$ whenever $w \in C_t$ (and this holds with high probability). At each time step, the UCB algorithm then simply picks the arm with the highest UCB given all previous observations:
$$x_t = \arg\max_{x \in A} \mathrm{UCB}_{x,t-1} = \arg\max_{x \in A,\, z \in C_{t-1}} x'z.$$
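The inner maximization over the ellipsoid has a closed form, $\max_{z \in C_t} z'x = \hat{w}_t'x + \beta_t \|x\|_{M_t^{-1}}$ with $\beta_t = 2\sqrt{d \log \frac{Td}{\delta}}$, which makes the selection rule straightforward to implement. The following is a minimal NumPy sketch under the notation above; the ridge regularizer `lam` and all function and variable names are illustrative assumptions, not from the lecture.

```python
import numpy as np

def ucb_select(arms, X, y, T, delta, lam=1.0):
    """Pick the arm maximizing max_{z in C_{t-1}} z'x (illustrative sketch,
    not the lecture's reference implementation).

    arms: (K, d) array of candidate action vectors.
    X:    (n, d) array of previously played actions.
    y:    (n,)   array of observed rewards.
    """
    d = arms.shape[1]
    M = lam * np.eye(d) + X.T @ X         # design matrix M_{t-1} (ridge-regularized)
    w_hat = np.linalg.solve(M, X.T @ y)   # regularized least-squares estimate of w
    beta = 2.0 * np.sqrt(d * np.log(T * d / delta))  # confidence radius from Theorem 1

    # Closed form of the inner maximum over the ellipsoid
    # C = {z : ||z - w_hat||_M <= beta}:
    #   max_{z in C} z'x = w_hat'x + beta * ||x||_{M^{-1}}.
    M_inv = np.linalg.inv(M)
    widths = np.sqrt(np.einsum("kd,de,ke->k", arms, M_inv, arms))
    ucbs = arms @ w_hat + beta * widths
    return int(np.argmax(ucbs))

# Example usage: three candidate arms in R^2 after two observed plays.
arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0.9, 0.2])
print(ucb_select(arms, X, y, T=100, delta=0.05))
```

Using the closed form avoids searching over $z$ explicitly, so each round costs one $d \times d$ solve plus one quadratic form per arm.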
Similar resources
Lecture 9: (Semi-)bandits and experts with linear costs (Part I)
In this lecture, we will study bandit problems with linear costs. In this setting, actions are represented by vectors in a low-dimensional real space. For simplicity, we will assume that all actions lie within the unit hypercube: $a \in [0,1]^d$. The action costs $c_t(a)$ are linear in the vector $a$, namely $c_t(a) = a \cdot v_t$ for some weight vector $v_t \in \mathbb{R}^d$ which is the same for all actions, but depends on $t$...
Lecture 2: Bandits with i.i.d. rewards (Part II)
So far we’ve discussed non-adaptive exploration strategies. Now let’s talk about adaptive exploration, in a sense that the bandit feedback of different arms in previous rounds are fully utilized. Let’s start with 2 arms. One fairly natural idea is to alternate them until we find that one arm is much better than the other, at which time we abandon the inferior one. But how to define ”one arm is ...
A Survey on Contextual Multi-armed Bandits
4 Stochastic Contextual Bandits
  4.1 Stochastic Contextual Bandits with Linear Realizability Assumption
    4.1.1 LinUCB/SupLinUCB
    4.1.2 LinREL/SupLinREL
    4.1.3 CofineUCB
    4.1.4 Thompson Sampling with Linear Payoffs ...
Asymptotic optimal control of multi-class restless bandits
We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines at each decision epoch which bandits to make active in order to minimize the overall average cost associated to the states the bandits are in. Since...
متن کاملAsymptotically optimal priority policies for indexable and non-indexable restless bandits
We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property and...